Chess Neighborhoods, Function Combination, and Reinforcement Learning

Authors

  • Robert Levinson
  • Ryan Weber
Abstract

Over the years, various research projects have attempted to develop a chess program that learns to play well given little prior knowledge beyond the rules of the game. Early on it was recognized that the key would be to adequately represent the relationships between the pieces and to evaluate the strengths or weaknesses of such relationships. Several such representations have been developed, including a graph-based model. In this paper we extend the work on graph representation to a precise type of graph that we call a piece or square neighborhood. Specifically, a chessboard is represented as 64 neighborhoods, one for each square. Each neighborhood has a center and 16 satellites corresponding to the pieces that are immediately close on the 4 diagonals, 2 ranks, 2 files, and 8 knight moves related to the square. Games are played and training values for boards are developed using temporal difference learning, as in other reinforcement learning systems. We then use a 2-layer regression network to learn. At the lower level the values (expected probability of winning) of the neighborhoods are learned, and at the top level they are combined based on their product and entropy. We report on relevant experiments, including a learning experience on the Internet Chess Club (ICC) from which we can estimate a rating for the new program. The level of chess play achieved in a few days of training is comparable to a few months of work on previous systems such as Morph, which is described as "one of the best from-scratch game learning systems, perhaps the best" [22].
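The neighborhood representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "immediately close" means the first piece met along each of the 8 sliding directions (2 along the rank, 2 along the file, 4 diagonals) and the occupant of each of the 8 knight-move squares, which gives 16 satellites per center. The board encoding and all names (`Board`, `neighborhood`, `all_neighborhoods`) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Square = Tuple[int, int]   # (file, rank), each in 0..7
Board = Dict[Square, str]  # occupied squares only, e.g. {(4, 0): "K"}

# Assumed reading of the abstract: 2 rank directions, 2 file directions,
# and 4 diagonals, followed by the 8 knight offsets (16 satellites total).
RAY_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1),
                  (1, 1), (1, -1), (-1, 1), (-1, -1)]
KNIGHT_OFFSETS = [(1, 2), (2, 1), (2, -1), (1, -2),
                  (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def on_board(square: Square) -> bool:
    """Return True if the square lies on the 8x8 board."""
    return 0 <= square[0] < 8 and 0 <= square[1] < 8


def first_piece_on_ray(board: Board, center: Square,
                       direction: Tuple[int, int]) -> Optional[str]:
    """Walk from the center along one direction; return the first piece found."""
    f, r = center[0] + direction[0], center[1] + direction[1]
    while on_board((f, r)):
        if (f, r) in board:
            return board[(f, r)]
        f, r = f + direction[0], r + direction[1]
    return None  # the ray left the board without meeting a piece


def neighborhood(board: Board,
                 center: Square) -> Tuple[Optional[str], List[Optional[str]]]:
    """Return (center occupant, 16 satellites) for one square."""
    satellites = [first_piece_on_ray(board, center, d) for d in RAY_DIRECTIONS]
    for df, dr in KNIGHT_OFFSETS:
        target = (center[0] + df, center[1] + dr)
        satellites.append(board.get(target) if on_board(target) else None)
    return board.get(center), satellites


def all_neighborhoods(board: Board):
    """Represent the whole board as 64 neighborhoods, one per square."""
    return [neighborhood(board, (f, r)) for r in range(8) for f in range(8)]
```

For example, with a white king on e1 encoded as `(4, 0)`, calling `neighborhood(board, (4, 0))` returns the king as the center together with a 16-element list in which empty or off-board satellites are `None`. How the paper encodes empty satellites and piece colors is not stated in the abstract, so those details are left open here.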


Similar resources

Using Temporal Neighborhoods to Adapt Function Approximators in Reinforcement Learning

To avoid the curse of dimensionality, function approximators are used in reinforcement learning to learn value functions for individual states. In order to make better use of computational resources (basis functions), many researchers are investigating ways to adapt the basis functions during the learning process so that they better fit the value-function landscape. Here we introduce temporal neig...


Learning to play chess using TD(λ)-learning with database games

In this paper we present some experiments in the training of different evaluation functions for a chess program through reinforcement learning. A neural network is used as the evaluation function of the chess program. Learning occurs by using TD(λ)-learning on the results of high-level database games. Experiments are performed with different classes of features and neural network architectures....


Learning the Piece Values for Three Chess Variants

A set of experiments for learning the values of chess pieces is described for the popular chess variants Crazyhouse Chess, Suicide Chess, and Atomic Chess. We follow an established methodology that relies on reinforcement learning from self-games. We attempt to learn piece values and the piece-square tables for three chess variants. The piece values arrived at are quite different from those of ...


Auditory memory function in expert chess players

Background: Chess is a game that involves many aspects of high-level cognition such as memory, attention, focus and problem solving. Long-term practice of chess can improve cognitive performance and behavioral skills. Auditory memory, as a kind of memory, can, like other behavioral skills, be influenced by strengthening processes following long-term chess playing because of common processing pat...


Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the ga...




Publication date: 2000